Lesson 3 Overview: Tackling Non-Linear Classification Problems

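We are moving beyond the limits of linear models, which struggle to classify data that cannot be separated by a straight line. Today we will use the PyTorch workflow to build a deep neural network (DNN). The ability to learn complex, non-linear decision boundaries is essential for real-world classification tasks.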

1. The Need to Visualize Non-Linear Data

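Our first step is to create a challenging synthetic dataset, such as the two-moons distribution, to show visually why a simple linear model fails. This setup forces us to use a deeper architecture to approximate the complex curve needed to separate the classes.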
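As a minimal sketch of such a dataset, the block below builds two interleaving half-circles using PyTorch only. The helper name `make_two_moons` and its parameters (`n_samples`, `noise`, `seed`) are assumptions made for illustration, not part of the lesson; scikit-learn's `sklearn.datasets.make_moons` produces a similar dataset if that library is available.

```python
import math
import torch


def make_two_moons(n_samples=200, noise=0.1, seed=0):
    """Hypothetical helper: two interleaving half-circles plus Gaussian noise."""
    gen = torch.Generator().manual_seed(seed)
    n_upper = n_samples // 2
    n_lower = n_samples - n_upper

    # Angles sweep half a circle for each moon.
    t_upper = torch.linspace(0, math.pi, n_upper)
    t_lower = torch.linspace(0, math.pi, n_lower)

    # Upper moon (class 0) and a shifted, flipped lower moon (class 1).
    upper = torch.stack([torch.cos(t_upper), torch.sin(t_upper)], dim=1)
    lower = torch.stack([1 - torch.cos(t_lower), 0.5 - torch.sin(t_lower)], dim=1)

    X = torch.cat([upper, lower], dim=0)
    X = X + noise * torch.randn(X.shape, generator=gen)

    # Labels are shaped (N, 1) so they line up with a single-output model.
    y = torch.cat([torch.zeros(n_upper), torch.ones(n_lower)]).unsqueeze(1)
    return X, y


X, y = make_two_moons()
print(X.shape, y.shape)  # torch.Size([200, 2]) torch.Size([200, 1])
```

Plotting `X[:, 0]` against `X[:, 1]`, colored by `y`, shows two crescents that wrap around each other, so no single straight line separates the classes cleanly.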

2. The Power of Non-Linear Activation Functions

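The core principle of a deep neural network is to introduce non-linearity in the hidden layers through functions such as ReLU. Without these non-linear activation functions, stacked layers would collapse into one large linear model, no matter how deep the network is.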
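The collapse can be checked numerically. For two linear layers with no activation between them, $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$, which is again a single linear map. The sketch below verifies this with `nn.Linear`; the layer sizes (2, 8, 1) and the batch of 5 random points are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers stacked with no activation in between.
layer1 = nn.Linear(2, 8)
layer2 = nn.Linear(8, 1)

x = torch.randn(5, 2)
stacked = layer2(layer1(x))

# The same map written as one linear layer: W = W2 W1, b = W2 b1 + b2.
W = layer2.weight @ layer1.weight                  # shape (1, 2)
b = layer2.weight @ layer1.bias + layer2.bias      # shape (1,)
collapsed = x @ W.T + b

# Equal up to floating-point rounding.
print(torch.allclose(stacked, collapsed, atol=1e-5))

# Inserting ReLU between the layers breaks this equivalence in general,
# which is what lets depth add expressive power.
with_relu = layer2(torch.relu(layer1(x)))
```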

Question 1

What is the primary purpose of the ReLU activation function in a hidden layer?

Introduce non-linearity so deep architectures can model curves

Speed up matrix multiplication

Ensure the output remains between 0 and 1

Normalize the layer output to a mean of zero

Question 2

Which activation function is required in the output layer for a binary classification task?

Sigmoid

Softmax

ReLU

Question 3

Which loss function corresponds directly to a binary classification problem using a Sigmoid output?

Binary Cross Entropy Loss (BCE)

Mean Squared Error (MSE)

Cross Entropy Loss

Challenge: Designing the Core Architecture

Integrating architectural components for non-linear learning.

You must build an nn.Module for the two-moons task. Input features: 2. Output units: 1 (the probability of the positive class).

Step 1

Describe the flow of computation for a single hidden layer in this DNN.

Solution:
Input $\to$ Linear Layer (Weight Matrix) $\to$ ReLU Activation $\to$ Output to Next Layer.
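A minimal sketch of this flow for one hidden layer is shown below. The hidden width of 16 and the batch of 4 random inputs are assumptions chosen for illustration. Note that `nn.Linear` applies a bias vector in addition to the weight matrix.

```python
import torch
from torch import nn

torch.manual_seed(0)

hidden_units = 16  # assumed width, not specified in the challenge
linear = nn.Linear(in_features=2, out_features=hidden_units)
relu = nn.ReLU()

x = torch.randn(4, 2)   # a batch of 4 samples with 2 features each
z = linear(x)           # affine step: x @ W.T + b, shape (4, 16)
a = relu(z)             # element-wise max(0, z), shape (4, 16)

# The affine step written out by hand, for comparison with nn.Linear.
z_manual = x @ linear.weight.T + linear.bias

print(a.shape)  # torch.Size([4, 16])
```

The tensor `a` is what gets passed on as the input to the next layer.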

Step 2

What must the final layer size be if the input shape is $(N, 2)$ and we use BCE loss?

Solution:
The output layer must have a single unit, so the model output has shape $(N, 1)$: one probability score per sample, matching the label shape.
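One way the pieces might fit together is sketched below. The class name `TwoMoonsNet`, the two hidden layers, and the width of 16 are assumptions for illustration; the challenge only fixes 2 input features and 1 output. The random inputs and labels stand in for real two-moons data so that the shapes can be checked.

```python
import torch
from torch import nn


class TwoMoonsNet(nn.Module):
    """Hypothetical model: 2 inputs -> hidden ReLU layers -> 1 sigmoid output."""

    def __init__(self, hidden_units=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, 1),  # single output unit
            nn.Sigmoid(),                # squashes the output into (0, 1)
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
model = TwoMoonsNet()

X = torch.randn(8, 2)                    # placeholder inputs, shape (N, 2)
y = torch.randint(0, 2, (8, 1)).float()  # placeholder labels, shape (N, 1)

probs = model(X)                         # shape (N, 1)
loss = nn.BCELoss()(probs, y)            # scalar loss

print(probs.shape)  # torch.Size([8, 1])
```

`nn.BCELoss` expects predictions and float targets of the same shape, which is why the labels are kept as $(N, 1)$. A common alternative is to drop the final `nn.Sigmoid` and use `nn.BCEWithLogitsLoss`, which applies the sigmoid internally.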